1 Abstract

A (very) short summary of the report. As an example, you can simply write one sentence for each section in the report.

2 Introduction

The greater scale and complexity of the human brain relative to that of a mouse is well known among neuroscientists. Nonetheless, the mouse brain is similar in structure and function to the human brain, making mice a common subject of neurological study. In this analysis, we look at data from an experiment by Steinmetz et al. (2019), which recorded the activity of neurons in the visual cortex of mice presented with visual stimuli. Following each stimulus, the mice were required to make a decision using a wheel controlled by their forepaws, and their decision determined whether they received a reward or a penalty.

3 Background

The data set we use covers only two of the ten mice in the study, Cori and Forssmann, across five sessions recorded between mid-December 2016 and early November 2017. Each session contains a varying number of trials, with each trial's spike trains recorded over 39 time bins. Since the five sessions have differing numbers of trials, we need a way to combine them into one data set with a response variable that can be compared across imbalanced sessions. We choose the mean firing rate for each trial: the average number of spikes per second across all recorded neurons within a 0.4-second interval. The choice of 0.4 seconds follows the design of the experiment, which focused on spike trains from stimulus onset to 0.4 seconds afterward. The resulting data frame, merged from the five sessions, is shown below: 1196 trials spanning the 5 sessions, with the mean firing rate, the left and right contrasts, the feedback type, and an ID column indicating which session each trial came from.
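As a worked illustration of this measure (using a small hypothetical spike matrix, not the real data): for a trial with 3 neurons and 4 time bins covering the 0.4-second window, the mean firing rate is the total spike count divided by the number of neurons and by 0.4 seconds.

```r
# Hypothetical 3-neuron x 4-bin spike matrix for one trial (illustration only)
spk <- matrix(c(0, 1, 0, 1,
                1, 0, 0, 0,
                0, 0, 1, 1), nrow = 3, byrow = TRUE)
t <- 0.4  # length of the spike-train window, in seconds
rate <- sum(spk) / nrow(spk) / t  # 5 spikes / 3 neurons / 0.4 s
rate
```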

# Read in the five sessions
session <- list()
for (i in 1:5) {
  session[[i]] <- readRDS(paste0('./Data/session', i, '.rds'))
  # print(session[[i]]$mouse_name)
  # print(session[[i]]$date_exp)
}


# id=11
# session[[5]]$feedback_type[id]
# session[[5]]$contrast_left[id]
# session[[5]]$contrast_right[id]
# length(session[[1]]$time[[id]])
# dim(session[[5]]$spks[[id]])

#ID=1
t=0.4 # from Background 

# Rows of contrast is quantity of trials (varying)
# rows of spikes/times is quantity of neurons (varying)
# columns of spikes/times is tracked sessions of brain (fixed)
# 
# n.trials=length(session[[ID]]$spks)
# # Number of neurons is the number of rows of spikes, which varies
# n.neurons=dim(session[[ID]]$spks[[1]])[1]
# 
# # Obtain the firing rate 
# firingrate=numeric(n.trials)
# for(i in 1:n.trials){
#   firingrate[i]=sum(session[[ID]]$spks[[i]])/n.neurons/t
# }
# firingrate

# Mean firing rate per trial: total spikes / number of neurons / t
total.fire.rate <- lapply(1:5, function(i)
  sapply(seq_along(session[[i]]$spks), function(j)
    sum(session[[i]]$spks[[j]]) / nrow(session[[i]]$spks[[j]]) / t))

# Flatten the list of lists
rates.flat <- (unlist(total.fire.rate))
mice <- data.frame("Firing.Rate" = rates.flat)

# Now create dataframe to join with session as a variable as well
contrast.dat <- data.frame()
abc <- sapply(1:5, function(i)
  cbind(
    session[[i]]$contrast_left,
    session[[i]]$contrast_right,
    session[[i]]$feedback_type,
    rep(i, length(session[[i]]$contrast_left))
  ))
contrast.dat <- do.call(rbind, abc)

# Merge data.frames
mice <- cbind(mice, contrast.dat)
colnames(mice) <-
  c("Firing.Rate", "C.Left", "C.Right", "Feedback_Type", "Session")

# Convert mice Session into factor
cols2fac <- c("C.Left", "C.Right", "Feedback_Type", "Session")
mice[cols2fac] <- lapply(mice[cols2fac], factor)
mice

4 Descriptive analysis

Before we begin our analysis, we report some summary statistics of the data, along with visualizations, to understand it in more depth. Below we print a summary of our response, the mean firing rate. We can see from the table that the lowest mean firing rate is 0.404 and the largest is 7.219. Given a median of 2.962 lying between these extremes, we suspect the distribution of the mean firing rate has some skew, which we examine in the plots below, colored by Feedback Type and Session number.

kable(as.data.frame(c(summary(mice$Firing.Rate))))
          c(summary(mice$Firing.Rate))
Min.                         0.4040404
1st Qu.                      1.9166667
Median                       2.9620075
Mean                         2.8579374
3rd Qu.                      3.6913696
Max.                         7.2191011
# Create Histograms,
# separated by Session
ggplotly(
  ggplot(mice, aes(x = Firing.Rate, fill = Session)) + geom_histogram(
    bins = 22,
    alpha = 0.8,
    position = "identity"
  )  + theme_igray() + labs(
    x = "Firing Rate",
    y = "Count",
    fill = "Session",
    title = "Histograms of Firing Rates Colored by Session"
  )
)
# Separated by Feedback Type
ggplotly(
  ggplot(mice, aes(x = Firing.Rate, fill = Feedback_Type)) + geom_histogram(
    position = "dodge",
    bins = 15,
    alpha = 0.9
  ) + theme_igray() + scale_fill_manual(values = c("#56B4E9", "#E69F00", "#999999")) + labs(
    x = "Firing Rate",
    y = "Count",
    fill = "Feedback Type",
    title = "Histograms of Firing Rates Colored by Feedback Type"
  )
)
# Separated by Feedback Type, NOT OUTPUTTING
# ggplotly(ggplot(mice, aes(x = Firing.Rate, fill = Feedback_Type)) + geom_histogram(position = "identity", bins = 22, alpha = 0.5) + scale_fill_manual(values=c("#E69F00","#999999", "#56B4E9")))

The histograms of firing rates show that, with each successive session, the distribution of firing rates shifts closer to zero: session 1 has the highest firing rates and session 5 the lowest. Using the interactive plot to isolate each session's histogram, we can also see that each session is heavily right-skewed on its own, except for Session 2, which appears more symmetric with heavier tails. Separating by Feedback Type, we see that more trials were rewarded than penalized at all firing rates except those between 0.75 and 1, where one more trial was penalized than rewarded.

Overall, these distributions confirm our earlier suspicion of skew: the left portion of the histograms holds more observations than the right tails. While the skew is apparent, it is not, visually at least, a strong departure from normality, which we will assess in the Sensitivity Analysis section.

# Box Plots 
# For Left Contrast
#ggplot(mice, aes(x = C.Left, y= Firing.Rate, fill = C.Left)) + geom_boxplot()
# Right Contrast
#ggplot(mice, aes(x = C.Right, y= Firing.Rate, fill = C.Right)) + geom_boxplot()

5 Inferential analysis

Our goal is to build a mixed-effects model for the firing rates of neurons, with fixed effects for left contrast \(\alpha_i\), right contrast \(\beta_j\), and their interaction \(\alpha\beta_{ij}\), and a random effect \(\gamma_k\) for each session. We have the following model:

\[ Y_{ijkl} = \mu + \alpha_i + \beta_j + \alpha\beta_{ij} + \gamma_k + \epsilon_{ijkl} \] where the indices range over: \[ i \in \{0, 0.25, 0.5, 1\}, \quad j \in \{0, 0.25, 0.5, 1\}, \quad k = 1, \ldots, 5, \quad l = 1, \ldots, n_k \] This model carries the constraints \(\sum_i n_i\alpha_i = 0\), \(\sum_j n_j\beta_j = 0\), and \(\sum_{ij} n_{ij}\alpha\beta_{ij} = 0\), along with the distributional assumptions \(\gamma_k \sim N(0, \sigma_{\gamma}^2)\) and \(\epsilon_{ijkl} \sim N(0,\sigma^2)\), with the random effects independent of the errors. With fixed effects for both the left and right contrasts, we aim to find how they affect the mean firing rate across all neurons. However, because we compare mean firing rates across multiple sessions, we include a random session effect to account for all non-fixed effects introduced by switching sessions. Both this random effect and the error term are normally distributed.

We now build our model using the lme4 library, which allows a random intercept \(\gamma_k\) for each session. Since we will be performing model comparisons between fixed effects, we abstain from restricted maximum likelihood (REML) estimation (Oehlert 4). We first build the full model described above, whose summary is shown below.

lm1 = lmer(Firing.Rate ~ C.Left * C.Right + (1 |Session),
           data = mice,
           REML = F)

# plot(lm1) save for sensitivity analysis
sum.lm1 <- summary(lm1)
kable(sum.lm1$coefficients)
Estimate Std. Error t value
(Intercept) 2.6440266 0.4514576 5.8566440
C.Left0.25 0.1230864 0.1151461 1.0689583
C.Left0.5 0.3297409 0.0774725 4.2562308
C.Left1 0.4181565 0.0790000 5.2931230
C.Right0.25 0.3294402 0.0957057 3.4422197
C.Right0.5 0.3104181 0.0771486 4.0236369
C.Right1 0.4755159 0.0655214 7.2574085
C.Left0.25:C.Right0.25 -0.2867070 0.1861141 -1.5404908
C.Left0.5:C.Right0.25 -0.3270568 0.1543957 -2.1183028
C.Left1:C.Right0.25 -0.4207931 0.1394203 -3.0181631
C.Left0.25:C.Right0.5 -0.2243095 0.1666453 -1.3460297
C.Left0.5:C.Right0.5 -0.2801156 0.1520929 -1.8417404
C.Left1:C.Right0.5 -0.3443762 0.1486681 -2.3164097
C.Left0.25:C.Right1 -0.3084436 0.1447033 -2.1315595
C.Left0.5:C.Right1 -0.1159386 0.1418182 -0.8175153
C.Left1:C.Right1 -0.1903870 0.1450161 -1.3128678
#ranef(lm1)
#predict(lm1, )

From the summary of the fixed effects, we can see that as contrast levels increase, the estimated mean firing rate increases. However, the interaction terms between the left and right contrasts all have negative estimates, lowering the combined effect. We test whether the interaction is significant using a likelihood ratio test between the full interaction model and a model with no interactions.

# Now build a model with no interactions 
lm2 = lmer(Firing.Rate ~ C.Left + C.Right + (1 |Session),
           data = mice,
           REML = F)
summary(lm2)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: Firing.Rate ~ C.Left + C.Right + (1 | Session)
##    Data: mice
## 
##      AIC      BIC   logLik deviance df.resid 
##   2349.3   2395.1  -1165.7   2331.3     1187 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.8457 -0.7083 -0.1206  0.6508  5.2257 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Session  (Intercept) 1.0189   1.0094  
##  Residual             0.4003   0.6327  
## Number of obs: 1196, groups:  Session, 5
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  2.69990    0.45252   5.966
## C.Left0.25  -0.06272    0.05531  -1.134
## C.Left0.5    0.21943    0.05318   4.126
## C.Left1      0.24160    0.05155   4.687
## C.Right0.25  0.12001    0.05538   2.167
## C.Right0.5   0.18721    0.05419   3.455
## C.Right1     0.38039    0.04820   7.892
## 
## Correlation of Fixed Effects:
##             (Intr) C.L0.2 C.L0.5 C.Lft1 C.R0.2 C.R0.5
## C.Left0.25  -0.018                                   
## C.Left0.5   -0.026  0.256                            
## C.Left1     -0.025  0.271  0.286                     
## C.Right0.25 -0.023 -0.156 -0.124 -0.249              
## C.Right0.5  -0.028 -0.152 -0.051 -0.074  0.289       
## C.Right1    -0.032 -0.252 -0.027 -0.030  0.313  0.332
anova(lm1, lm2)
# The likelihood ratio test gives a p-value below 0.05, so we reject the null hypothesis that the two models fit equally well and keep the full interaction model
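For reference, the statistic that anova() computes here can be sketched by hand. The following self-contained example uses simulated stand-in data (not the report's mice data) to show the mechanics: twice the difference in maximized log-likelihoods, compared against a chi-square with degrees of freedom equal to the number of extra parameters.

```r
# Hand-computed likelihood ratio test on simulated data (illustration only)
set.seed(42)
d <- data.frame(a = factor(rep(1:2, 50)), b = factor(rep(1:2, each = 50)))
d$y <- rnorm(100) + as.numeric(d$a)
full    <- glm(y ~ a * b, data = d)  # model with interaction
reduced <- glm(y ~ a + b, data = d)  # nested model without interaction
lrt  <- as.numeric(2 * (logLik(full) - logLik(reduced)))  # test statistic
pval <- pchisq(lrt, df = 1, lower.tail = FALSE)  # df = 1 extra parameter here
```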


### PERTAINING TO THE RANDOM EFFECT TESTING 
# The 3d Scatter will help visualize the difference between the random effects

# Set up data by predictions for 3d Scatter (not including heatmap, not as useful)
pred.mice <-
  expand.grid(x = unique(mice$C.Left), y = unique(mice$C.Right)) # All permutations of contrasts
pred.mice <- pred.mice %>% slice(rep(row_number(), 5))
pred.mice <-
  cbind(pred.mice, c(sapply(1:5, function (x)
    rep(x, 16)))) # combine with session numbers
colnames(pred.mice) <- c("C.Left", "C.Right", "Session")
pred.mice <-
  pred.mice %>% mutate_if(is.numeric, as.factor) # convert to factors
pred.mice <-
  cbind(pred.mice, predict(lm1, pred.mice)) # add predictions to df
colnames(pred.mice) <-
  c("C.Left", "C.Right", "Session", "Pred")  # add column names
# ggplotly(ggplot(pred.mice, aes(
#   x = C.Left, y = C.Right, fill = Pred
# )) + geom_tile() + facet_wrap( ~ Session, ncol = 2))

# Lets try with plotly scatter
l <- list(
  font = list(
    family = "sans-serif",
    size = 12,
    color = "#000"
  ),
  bgcolor = "#E2E2E2",
  bordercolor = "#FFFFFF",
  borderwidth = 2
)
plot_ly(
  x = pred.mice$C.Left,
  y = pred.mice$C.Right,
  z = pred.mice$Pred,
  type = "scatter3d",
  mode = "markers",
  color = pred.mice$Session
) %>% layout(
  scene = list(
    xaxis = list(title = "Left Contrast"),
    yaxis = list(title = "Right Contrast"),
    zaxis = list(title = "Mean Firing Rate")
  ),
  legend = list(title = list(text = '<b> Session </b>'))
) %>% layout(legend = l)
# Test for model with no random effect
lm.noRando <- lm(Firing.Rate ~ C.Left * C.Right,
           data = mice)
anova(lm.noRando)
# Likelihood ratio test 
anova(lm1, lm.noRando)
# Random effect has very small p-value, so we also reject and continue with our full mixed effect model

6 Sensitivity Analysis

6.1 Outliers

# Before checking Normality and Homoscedasticity, lets check for outliers
cookD.lm1 <- cooks.distance(lm1)
row.ind <- seq(1, nrow(mice), 1)
ggplotly(
  ggplot() + aes(x = row.ind, y = cookD.lm1) + geom_point() + labs(y = "Cooks Distance", x = "Observation Number", title = "Plot of Cooks Distance vs Observation Number for Mixed Effect Model")
)
ggplotly(
  ggplot() + aes(x = row.ind, y = sum.lm1$residuals) + geom_point() + labs(y = "Residuals", x = "Observation Number", title = "Plot of Residuals vs Observation Number for Mixed Effect Model")
)
# From the plot we can see that observations 10, 806, and 3 have the highest Cook's distances; let's see what stands out
# Observation 44 has the largest residual value
mice[c(3, 10, 44, 806),]
# Observation 10 has a firing rate of 6.66 when the average firing rate in its session is around 4; observation 3 is also a session-1 trial with a high firing rate, though not as drastic as observation 10
# Observation 44 has an extremely high mean firing rate for session 1
# Observation 806 has an abnormally high firing rate for its contrasts: the predicted value for a left contrast of 0.25 and a right contrast of 0 is 2.183, while observation 806 has a firing rate of 3.85

# We will remove these rows, rebuild the model, and check assumptions
mice.red <-  mice%>%
  filter(!row_number() %in% c(3, 10,44, 806))

# build model with outliers removed
lm1.red = lmer(Firing.Rate ~ C.Left * C.Right + (1 |Session),
           data = mice.red,
           REML = F)
sum.lm1.red <- summary(lm1.red)
# Continue with sensitivity analysis

6.2 Normality

# Plot a histogram of the residuals 
ggplotly(
  ggplot() + aes(x = sum.lm1.red$residuals) + geom_histogram(bins = 20, fill = "#E69F00") + theme_igray() + labs(x = "Pearson Residuals", y = "Count", title = "Distribution of Residuals for Full Mixed Effect Model")
)
# Shapiro-Wilk test
shapiro.test(sum.lm1.red$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  sum.lm1.red$residuals
## W = 0.99007, p-value = 3.287e-07

The Shapiro-Wilk test rejects exact normality (p = 3.3e-07). With nearly 1,200 observations, however, the test is sensitive to even small departures; given W = 0.990 and only mild skew in the residual histogram, we judge the normality assumption to be approximately satisfied.

6.3 Homoscedasticity

ggplotly(
  ggplot() + aes(x = predict(lm1.red), y = sum.lm1.red$residuals) + geom_point() + theme_igray() + stat_spline(
    spar = 0.95,
    size = 0.8,
    color = "#cd534cff"
  ) + labs(x = "Fitted Values", y = "Pearson Residuals", title = "Pearson Residuals vs. Fitted Values for the Final Model")
)
head(mice.red)
leveneTest(Firing.Rate ~ C.Left*C.Right, data = mice.red)
# We can see from the Levene test and the residuals-vs-fitted plot that the homoscedasticity assumption is met

7 Predictive Modeling

# We now build a logistic regression model for feedback type, starting with the full data
mice.logit <-
  glm(Feedback_Type ~ C.Left + C.Right + Session,
      data = mice,
      family = "binomial")
summary(mice.logit)
## 
## Call:
## glm(formula = Feedback_Type ~ C.Left + C.Right + Session, family = "binomial", 
##     data = mice)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7091  -1.3261   0.7633   0.9667   1.2175  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.15575    0.17778   6.501 7.98e-11 ***
## C.Left0.25  -0.32590    0.18147  -1.796 0.072508 .  
## C.Left0.5   -0.26937    0.17758  -1.517 0.129299    
## C.Left1     -0.08745    0.17429  -0.502 0.615826    
## C.Right0.25 -0.74861    0.18307  -4.089 4.33e-05 ***
## C.Right0.5  -0.63755    0.18018  -3.538 0.000402 ***
## C.Right1    -0.47784    0.16297  -2.932 0.003368 ** 
## Session2    -0.17510    0.19906  -0.880 0.379063    
## Session3    -0.07145    0.20489  -0.349 0.727290    
## Session4     0.04070    0.20132   0.202 0.839789    
## Session5     0.02000    0.19940   0.100 0.920085    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1539.1  on 1195  degrees of freedom
## Residual deviance: 1504.2  on 1185  degrees of freedom
## AIC: 1526.2
## 
## Number of Fisher Scoring iterations: 4
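A fitted logistic regression like the one above can be turned into a classifier by thresholding predicted probabilities. The sketch below is self-contained and uses simulated stand-in data (the real `mice` data frame is built earlier in the report; the variable names here are hypothetical) to show a 0.5 cutoff, a confusion matrix, and training accuracy.

```r
set.seed(1)
# Hypothetical stand-in data with a -1/1 outcome like Feedback_Type
dat <- data.frame(y = factor(sample(c(-1, 1), 200, replace = TRUE, prob = c(0.3, 0.7))),
                  x = runif(200))
fit <- glm(y ~ x, data = dat, family = binomial)
p   <- predict(fit, type = "response")            # P(y = second factor level)
cls <- ifelse(p > 0.5, levels(dat$y)[2], levels(dat$y)[1])
table(Predicted = cls, Observed = dat$y)          # confusion matrix
acc <- mean(cls == dat$y)                         # training accuracy
```

Out-of-sample evaluation (e.g., a train/test split or cross-validation) would give a less optimistic estimate of accuracy than the training value computed here.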

8 Discussion

Conclude your analysis in this section. You can touch on the following topics.

  • A brief recap of this project.
  • Suggestions for future research and/or policy making given your findings.
  • Caveats of the current analysis.


Acknowledgement

By default, it is assumed that you have discussed this project with your teammates and instructors. List any other people that you have discussed this project with.

Reference

List any references you cited in the report. See here for the APA format, as an example:

Imbens, G., & Rubin, D. (2015). Stratified Randomized Experiments. In Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction (pp. 187-218). Cambridge: Cambridge University Press. doi:10.1017/CBO9781139025751.010

Session info

Report information of your R session for reproducibility.

sessionInfo()
## R version 4.2.2 (2022-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 22621)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] car_3.0-13       carData_3.0-5    ggformula_0.10.2 ggridges_0.5.4  
##  [5] scales_1.2.0     ggstance_0.3.6   ggthemes_4.2.4   dplyr_1.0.10    
##  [9] knitr_1.40       lme4_1.1-29      Matrix_1.5-1     plotly_4.10.0   
## [13] ggplot2_3.4.0   
## 
## loaded via a namespace (and not attached):
##  [1] httr_1.4.3         sass_0.4.1         tidyr_1.2.0        jsonlite_1.8.0    
##  [5] viridisLite_0.4.0  splines_4.2.2      bslib_0.3.1        assertthat_0.2.1  
##  [9] highr_0.9          yaml_2.3.5         pillar_1.7.0       lattice_0.20-45   
## [13] glue_1.6.2         digest_0.6.29      RColorBrewer_1.1-3 polyclip_1.10-0   
## [17] minqa_1.2.4        colorspace_2.0-3   htmltools_0.5.2    pkgconfig_2.0.3   
## [21] labelled_2.10.0    haven_2.5.0        purrr_0.3.4        tweenr_1.0.2      
## [25] ggforce_0.3.3      tibble_3.1.7       generics_0.1.2     farver_2.1.0      
## [29] ellipsis_0.3.2     withr_2.5.0        lazyeval_0.2.2     cli_3.3.0         
## [33] magrittr_2.0.3     crayon_1.5.1       evaluate_0.15      fansi_1.0.3       
## [37] nlme_3.1-160       MASS_7.3-58.1      forcats_1.0.0      tools_4.2.2       
## [41] data.table_1.14.2  hms_1.1.1          lifecycle_1.0.3    stringr_1.4.0     
## [45] munsell_0.5.0      compiler_4.2.2     jquerylib_0.1.4    rlang_1.0.6       
## [49] grid_4.2.2         nloptr_2.0.2       rstudioapi_0.13    htmlwidgets_1.5.4 
## [53] crosstalk_1.2.0    mosaicCore_0.9.2.1 labeling_0.4.2     rmarkdown_2.14    
## [57] boot_1.3-28        gtable_0.3.0       abind_1.4-5        DBI_1.1.2         
## [61] R6_2.5.1           fastmap_1.1.0      utf8_1.2.2         stringi_1.7.6     
## [65] Rcpp_1.0.8.3       vctrs_0.5.0        tidyselect_1.1.2   xfun_0.31